1. Foreword

In recent years a number of authors have noted how Charles Stein's character- 
ization of the Gaussian (see [11]) and the so-called "magic factors" crop up in 
matters related to information theory (see [5], [6], [7], [3] or [1] and the references 
therein). The purpose of this note is to make this connection explicit. 

2. Results 

We consider densities $p : \mathbb{R} \to \mathbb{R}^+$ whose support is an interval $S := S_p$ with
closure $\bar{S} = [a, b]$, for some $-\infty \le a < b \le \infty$. Among these we denote by $\mathcal{Q}$ the
collection of densities which are (strongly) differentiable at every point in the
interior of their support.

Definition 2.1. Fix $p \in \mathcal{Q}$ with support $S$ and define $\mathcal{F}(p)$ as the collection of
test functions $f : \mathbb{R} \to \mathbb{R}$ such that the mapping $x \mapsto f(x)p(x)$ is bounded on $\mathbb{R}$
and strongly differentiable on the interior of $S$.

Take a real bounded function $h$ with support $S$, and suppose that $h$ is
(strongly) differentiable on the interior of $S$. Then $h$ can be written as $\tilde{h}\,\mathbb{I}_S$ with
$\tilde{h}$ a differentiable continuation of $h$ on $\mathbb{R}$. In the sequel we will write $d_y h(y)|_{y=x}$
for the differential in the sense of distributions of $h$ evaluated at $x$, so that
\[ d_y h(y)|_{y=x} = (\tilde{h})'(x)\,\mathbb{I}_S(x) + \tilde{h}(x)\left(\delta_{\{x=a\}} - \delta_{\{x=b\}}\right), \]
where $\delta$ represents a Dirac delta.

Definition 2.2. Let $\mathbb{R}^\star$ be the collection of all functions $f : \mathbb{R} \to \mathbb{R}$. We define
the (location-based) Stein operator as the operator $T : \mathbb{R}^\star \times \mathcal{Q} \to \mathbb{R}^\star : (f,p) \mapsto T(f,p)$
such that

\[ T(f,p) : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{d_y(f(y)p(y))|_{y=x}}{p(x)} \tag{2.1} \]

for all $f$ for which the differential (in the sense of distributions) exists.

Remark 2.1. The terminology "location-based" Stein operator is inherited from 
our parametric approach to Stein characterizations (see [8]), where a much more 
general characterization result is proposed. 

To avoid ambiguities related to division by 0, throughout this paper we adopt
the convention that, whenever an expression involves the division by an indicator
function $\mathbb{I}_A$ for some measurable set $A$, we are multiplying the expression by the
said indicator function. This convention ensures that for $p \in \mathcal{Q}$ and $f \in \mathcal{F}(p)$
and for any continuous random variable $X$, the quantity $T(f,p)(X)$ is well-defined.
We further draw the reader's attention to the fact that, in particular,
ratios $p(x)/p(x)$ do not necessarily simplify to 1.

Example 2.1. It is perhaps informative to see how Definitions 2.1 and 2.2 spell 
out for different explicit choices of target densities. 

1. If $p = \phi$, the standard Gaussian, then $\mathcal{F}(\phi)$ contains the set of all
differentiable bounded functions and

\[ T(f,\phi)(x) = f'(x) - x f(x), \]

which is Stein's well-known operator for characterizing the Gaussian. 

2. If $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, the exponential Exp, then (abusing notations)
$\mathcal{F}(Exp)$ contains the set of all differentiable bounded functions and

\[ T(f,Exp)(x) = \left(f'(x) - f(x) + f(x)\,\delta_{\{x=0\}}\right)\mathbb{I}_{[0,\infty)}(x). \]

3. If $p(x) = \mathbb{I}_{[0,1]}(x)$, the standard uniform $U(0,1)$, then $\mathcal{F}(U(0,1))$ contains
the set of all differentiable bounded functions and

\[ T(f,U(0,1))(x) = \left(f'(x) + f(x)\left(\delta_{\{x=0\}} - \delta_{\{x=1\}}\right)\right)\mathbb{I}_{[0,1]}(x). \]



4. If $p(x) = \frac{1}{2\pi}\sqrt{4 - x^2}\,\mathbb{I}_{(-2,2)}(x)$, Wigner's semicircle law SC, then $\mathcal{F}(SC)$
contains the set of all functions of the form $f(x) = f_0(x)(4 - x^2)$ for some
bounded differentiable $f_0$ and, for these $f$, the operator becomes

\[ T(f,SC)(x) = \left((4 - x^2) f_0'(x) - 3x f_0(x)\right)\mathbb{I}_{(-2,2)}(x). \]

5. If $p(x) = \frac{1}{\pi\sqrt{x(1-x)}}\,\mathbb{I}_{(0,1)}(x)$, the arcsine distribution AS, then $\mathcal{F}(AS)$
contains the collection of all functions of the form $f(x) = f_0(x)\sqrt{x(1-x)}$ for
some bounded differentiable $f_0$ and, for these $f$, the operator becomes

\[ T(f,AS)(x) = \sqrt{x(1-x)}\, f_0'(x)\,\mathbb{I}_{(0,1)}(x). \]

6. If $p(x)$ is a member of Pearson's family of distributions and thus satisfies

\[ (s(x)p(x))' = \tau(x)p(x) \]

for $\tau$ a polynomial of exact degree one and $s$ a polynomial of degree at most
two, then, abusing notations one last time, we easily see that $\mathcal{F}(P(s,\tau))$
contains the set of all functions of the form $f(x) = f_0(x)s(x)$ for $f_0$
bounded, differentiable such that $f(a^+) = f(b^-) = 0$ and, for these $f$,
the operator becomes

\[ T(f,P(s,\tau))(x) = \left(s(x) f_0'(x) + \tau(x) f_0(x)\right)\mathbb{I}_S(x). \]

The first three operators are well-known and can be found, for instance, in [12]. 
The fourth example can be found in [4]. The last example comes from [9]. 
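The interior parts of the operators in Example 2.1 can be spot-checked numerically. The sketch below is only an illustration under our own assumptions: the test function $f_0 = \tanh$, the evaluation points and the helper names are not from the references, and the Dirac terms at the boundary are ignored. It compares a central finite difference of $y \mapsto f(y)p(y)$, divided by $p(x)$, with the closed forms of items 1, 4 and 5.

```python
import math

def stein_interior(f, p, x, h=1e-5):
    """Interior part of T(f, p)(x): central difference of y -> f(y) p(y), over p(x)."""
    return (f(x + h) * p(x + h) - f(x - h) * p(x - h)) / (2.0 * h) / p(x)

# Illustrative test function (our assumption): f0 = tanh, with f0' = 1 - tanh^2.
def f0(x):
    return math.tanh(x)

def df0(x):
    return 1.0 - math.tanh(x) ** 2

# Item 1: standard Gaussian, f = f0, T = f' - x f.
def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gauss_formula(x):
    return df0(x) - x * f0(x)

# Item 4: semicircle law, f = f0 * (4 - x^2), T = (4 - x^2) f0' - 3 x f0.
def semicircle(x):
    return math.sqrt(4.0 - x * x) / (2.0 * math.pi)

def sc_f(x):
    return f0(x) * (4.0 - x * x)

def sc_formula(x):
    return (4.0 - x * x) * df0(x) - 3.0 * x * f0(x)

# Item 5: arcsine law, f = f0 * sqrt(x (1 - x)), T = sqrt(x (1 - x)) f0'.
def arcsine(x):
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))

def as_f(x):
    return f0(x) * math.sqrt(x * (1.0 - x))

def as_formula(x):
    return math.sqrt(x * (1.0 - x)) * df0(x)

def max_gap(f, p, formula, points):
    """Largest absolute gap between the finite difference and the closed form."""
    return max(abs(stein_interior(f, p, x) - formula(x)) for x in points)

gauss_gap = max_gap(f0, phi, gauss_formula, [-2.0, -0.5, 0.0, 1.0, 2.5])
sc_gap = max_gap(sc_f, semicircle, sc_formula, [-1.5, -0.3, 0.4, 1.7])
as_gap = max_gap(as_f, arcsine, as_formula, [0.1, 0.35, 0.5, 0.9])
print(gauss_gap, sc_gap, as_gap)
```

All three gaps should be at the level of finite-difference error; the check says nothing about the boundary terms, which require the distributional derivative.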

We are now ready to state and prove our first main result. 

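Theorem 2.1 (Density approach). Let $p \in \mathcal{Q}$ with support $S$, and take $Z \sim p$.
Let $\mathcal{F}(p)$ be as in Definition 2.1 and $T$ as in Definition 2.2. Let $X$ be a
real-valued continuous random variable.

(1) If $X \stackrel{\mathcal{D}}{=} Z$ then $\mathrm{E}[T(f,p)(X)] = 0$ for all $f \in \mathcal{F}(p)$.

(2) If $\mathrm{E}[T(f,p)(X)] = 0$ for all $f \in \mathcal{F}(p)$, then $X \mid X \in S \stackrel{\mathcal{D}}{=} Z$.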

Proof. To see (1), note that the hypotheses on $f$ and $p$ guarantee that we have
$\mathrm{E}[T(f,p)(Z)] = [f(y)p(y)]_a^b + f(a^+)p(a^+) - f(b^-)p(b^-) = 0$. To see (2), consider
for $z \in \mathbb{R}$ the functions $f_z^p$ defined through

\[ f_z^p : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{1}{p(x)} \int_a^x l_z(u)p(u)\,du \]

with $l_z(u) := \left(\mathbb{I}_{(-\infty,z]}(u) - \mathrm{P}_p(X \le z)\right)\mathbb{I}_S(u)$ and $\mathrm{P}_p(X \le z) := \int_a^z p(u)\,du$.
Clearly $f_z^p \in \mathcal{F}(p)$ for all $z$. Moreover we have $d_y(f_z^p(y)p(y))|_{y=x} = l_z(x)p(x)$
since $\int_a^c l_z(u)p(u)\,du = 0$ for $c = a$ and $c = b$. Therefore $f_z^p$ satisfies, for all $z$,
the so-called Stein equation

\[ T(f_z^p, p)(x) = l_z(x). \tag{2.2} \]

Hence we can use $\mathrm{E}[T(f_z^p,p)(X)] = 0$ to deduce that $\mathrm{P}(X \in (-\infty,z] \cap S) =
\mathrm{P}(Z \le z)\,\mathrm{P}(X \in S)$ for all $z$, whence the claim. □
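Both halves of the proof can be illustrated numerically for the standard Gaussian target. The following sketch rests on our own choices ($f = \tanh$, $z = 0.7$, the evaluation points, a Simpson rule, and all helper names are assumptions). It first checks that $\mathrm{E}[T(f,\phi)(Z)] = \int (f'(x) - x f(x))\phi(x)\,dx$ vanishes, and then that the explicit solution $f_z^\phi$ obtained by integrating $l_z \phi$ satisfies the Stein equation (2.2) away from the kink at $z$.

```python
import math

SQRT2 = math.sqrt(2.0)

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal cdf; erfc keeps relative accuracy in the lower tail.
    return 0.5 * math.erfc(-x / SQRT2)

def simpson(g, lo, hi, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

# Part (1): E[T(f, phi)(Z)] for the illustrative choice f = tanh.
def stein_integrand(x):
    f = math.tanh(x)
    df = 1.0 - f * f
    return (df - x * f) * phi(x)

mean_T = simpson(stein_integrand, -10.0, 10.0)

# Part (2): f_z(x) = (1/phi(x)) * int_{-inf}^x (1{u <= z} - Phi(z)) phi(u) du,
# written in closed form on each side of z.
def stein_solution(z, x):
    if x <= z:
        return Phi(x) * Phi(-z) / phi(x)
    return Phi(z) * Phi(-x) / phi(x)

def stein_residual(z, x, h=1e-5):
    """|f_z'(x) - x f_z(x) - l_z(x)|, with f_z' from a central difference."""
    df = (stein_solution(z, x + h) - stein_solution(z, x - h)) / (2.0 * h)
    lhs = df - x * stein_solution(z, x)
    rhs = (1.0 if x <= z else 0.0) - Phi(z)
    return abs(lhs - rhs)

z = 0.7
residual = max(stein_residual(z, x) for x in [-3.0, -1.0, 0.0, 0.5, 1.2, 2.5])
print(mean_T, residual)
```

Both printed numbers should be zero up to discretization error.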

Theorem 2.1 encompasses Proposition 4 in [12] and Theorem 1 in [9] and is 
easily shown to contain many of the other better known Stein characterizations 
(such as the characterization of the semi-circular in [4]). We draw the reader's 
attention to the fact that our way of writing the Stein operator (2.1) also shows
that all Stein equations of the form (2.2) (that is, most such equations from the 
literature) can be solved by simple integration. Also, the form of our operators 
leads directly to our second main result. 

Theorem 2.2 (Factorization Theorem of Stein Operators). Let $p$ and $q$ be
probability density functions in $\mathcal{Q}$ sharing support $S$. For all $f \in \mathcal{F}(p) \cap \mathcal{F}(q)$,
we have

\[ T(f,p)(x) = T(f,q)(x) + f(x)\,r(p,q)(x), \]

with

\[ r(p,q)(x) := \frac{p'(x)}{p(x)} - \frac{q'(x)}{q(x)} + \left(\delta_{\{x=a\}} - \delta_{\{x=b\}}\right)\mathbb{I}_S(x). \]

Proof. The restriction on the support of $q$ guarantees that we have $f(y)p(y) =
f(y)q(y)p(y)/q(y)$ for any real-valued function $f$. We can therefore write

\begin{align*}
T(f,p)(x) &= \frac{d_y\left(f(y)q(y)\,p(y)/q(y)\right)|_{y=x}}{p(x)} \\
&= \frac{d_y(f(y)q(y))|_{y=x}}{p(x)}\,\frac{p(x)}{q(x)} + f(x)q(x)\,\frac{d_y(p(y)/q(y))|_{y=x}}{p(x)} \\
&= T(f,q)(x) + f(x)\,\frac{q(x)}{p(x)}\, d_y(p(y)/q(y))|_{y=x}.
\end{align*}

The claim follows. □
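When $S = \mathbb{R}$ there are no Dirac terms and the factorization can be checked pointwise. The sketch below is an illustration under our own assumptions: $p$ is the standard Gaussian, $q$ the standard logistic density (for which $q'/q = -\tanh(x/2)$), $f = \sin$, and both operators are approximated by central differences.

```python
import math

def p(x):
    # Standard Gaussian density.
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def q(x):
    # Standard logistic density, written symmetrically to avoid overflow.
    e = math.exp(-abs(x))
    return e / (1.0 + e) ** 2

def r(x):
    # r(p, q) = p'/p - q'/q with p'/p = -x and q'/q = -tanh(x/2).
    return -x + math.tanh(0.5 * x)

def f(x):
    # Bounded differentiable test function (illustrative choice).
    return math.sin(x)

def stein(f, dens, x, h=1e-5):
    """T(f, dens)(x) on the real line via a central difference of f * dens."""
    return (f(x + h) * dens(x + h) - f(x - h) * dens(x - h)) / (2.0 * h) / dens(x)

points = [-2.0, -0.7, 0.0, 0.4, 1.5, 3.0]
factor_gap = max(abs(stein(f, p, x) - (stein(f, q, x) + f(x) * r(x))) for x in points)
print(factor_gap)
```

The printed gap should be of the order of the finite-difference error only.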

Note that, whenever $S = \mathbb{R}$ or $S$ is an open interval, $r(p,q)$ simplifies to
$p'/p - q'/q$. Now, let $l$ be a real-valued function. In the sequel we will write
$\mathrm{E}_p[l(X)] := \int_{\mathbb{R}} l(x)p(x)\,dx$. Our next and final main result is immediate and
hence its proof is left to the reader.

Theorem 2.3 (Stein's method and information distances). Let $p$ and $q$ be
probability density functions in $\mathcal{Q}$ sharing support $S$. Let $l$ be a real-valued function
such that $\mathrm{E}_p[l(X)]$ and $\mathrm{E}_q[l(X)]$ exist. Define $f_l^p$ to be the solution of the Stein
equation

\[ T(f,p)(x) = \left(l(x) - \mathrm{E}_p[l(X)]\right)\mathbb{I}_S(x) \tag{2.3} \]

and suppose that $f_l^p \in \mathcal{F}(q)$. Then

\[ \mathrm{E}_q[l(X)] - \mathrm{E}_p[l(X)] = \mathrm{E}_q\left[f_l^p(X)\,r(p,q)(X)\right]. \tag{2.4} \]

Whenever $p$ is well-behaved, the solutions to (2.3) are of the well-known form

\[ f_l^p : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{1}{p(x)} \int_a^x \left(l(u) - \mathrm{E}_p[l(X)]\right)p(u)\,du. \tag{2.5} \]

In cases such as the SC or the AS, the form of this solution (expressed in terms
of $f_0$ instead of $f$) is slightly different but easily provided, see Example 2.1 or
equations (18) and (19) in Proposition 1 of [9].
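Identity (2.4) lends itself to a direct numerical check. In the sketch below, every concrete choice is ours and purely illustrative: $p = \phi$, $q$ the $N(\mu, \sigma^2)$ density with $\mu = 0.5$ and $\sigma = 1.2$, and $l = \mathbb{I}_{(-\infty, z]}$ with $z = 0.3$. The solution (2.5) is then available in closed form on each side of $z$, $r(p,q)(x) = -x + (x - \mu)/\sigma^2$, and the right-hand side of (2.4) is computed with a Simpson rule split at the kink.

```python
import math

SQRT2 = math.sqrt(2.0)
MU, SIGMA, Z = 0.5, 1.2, 0.3  # illustrative parameters (assumptions)

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    # Standard normal cdf via erfc for tail accuracy.
    return 0.5 * math.erfc(-x / SQRT2)

def q(x):
    return phi((x - MU) / SIGMA) / SIGMA

def f_sol(x):
    # Solution (2.5) for p = phi and l the indicator of (-inf, Z].
    if x <= Z:
        return Phi(x) * Phi(-Z) / phi(x)
    return Phi(Z) * Phi(-x) / phi(x)

def r(x):
    # p'/p - q'/q for p = phi and q = N(MU, SIGMA^2); no Dirac terms on R.
    return -x + (x - MU) / SIGMA ** 2

def simpson(g, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def integrand(x):
    return f_sol(x) * r(x) * q(x)

# Left-hand side of (2.4): E_q[l(X)] - E_p[l(X)].
lhs = Phi((Z - MU) / SIGMA) - Phi(Z)
# Right-hand side of (2.4), integrating separately on each side of the kink at Z.
rhs = simpson(integrand, MU - 12.0 * SIGMA, Z) + simpson(integrand, Z, MU + 12.0 * SIGMA)
print(lhs, rhs)
```

The two printed values should agree up to quadrature error.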
In all explicit instances covered in Example 2.1, the condition on $f_l^p$ required in
Theorem 2.3 is trivially verified (see page 4 in [2] for the Gaussian). Under moment
conditions on $p$, Schoutens shows that members of the Pearson family satisfy
this assumption as well (see [9], Lemma 1).

3. Application 

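Applying Hölder's inequality to (2.4) shows that, under the same conditions,

\[ \left|\mathrm{E}_q[l(X)] - \mathrm{E}_p[l(X)]\right| \le \kappa_l^p \sqrt{\mathrm{E}_q\left[(r(p,q)(X))^2\right]} \tag{3.1} \]

with

\[ \kappa_l^p = \sqrt{\mathrm{E}_q\left[(f_l^p(X))^2\right]}. \]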

Equation (3.1) provides some form of universal bound on differences of
expectations in terms of what can be likened to a generalized (standardized) Fisher
information distance

\[ J(p,q) = \mathrm{E}_q\left[(r(p,q)(X))^2\right] \]

(the terminology and notations are borrowed from [1]). Note how, for instance,
taking $p = \phi$ the standard Gaussian density yields the Fisher information
distance studied, e.g., in [6].
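To make $J(p,q)$ concrete, the sketch below evaluates it for $p = \phi$ and $q$ the $N(\mu, \sigma^2)$ density. The parameters and helper names are our own illustrative assumptions. Writing $X = \mu + \sigma Z$ with $Z$ standard normal gives $r(p,q)(X) = -\mu + Z(1/\sigma - \sigma)$, so that $J(p,q) = \mu^2 + (1/\sigma - \sigma)^2$; this closed form is our own computation and is compared with a Simpson quadrature.

```python
import math

MU, SIGMA = 0.5, 1.2  # illustrative parameters (assumptions)

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def q(x):
    # N(MU, SIGMA^2) density.
    return phi((x - MU) / SIGMA) / SIGMA

def r(x):
    # r(p, q) = p'/p - q'/q for p = phi and q = N(MU, SIGMA^2).
    return -x + (x - MU) / SIGMA ** 2

def simpson(g, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

# J(p, q) = E_q[(r(p, q)(X))^2], by quadrature and in closed form.
J_numeric = simpson(lambda x: r(x) ** 2 * q(x), MU - 12.0 * SIGMA, MU + 12.0 * SIGMA)
J_closed = MU ** 2 + (1.0 / SIGMA - SIGMA) ** 2
print(J_numeric, J_closed)
```

In this example $J(p,q)$ vanishes exactly when $\mu = 0$ and $\sigma = 1$, that is, when $q = p$.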

Theorem 2.3 also provides a bound on any probability metric which can be
written as

\[ d_{\mathcal{H}}(p,q) = \sup_{h \in \mathcal{H}} \left|\mathrm{E}_q[h(X)] - \mathrm{E}_p[h(X)]\right| \tag{3.2} \]

for some class of functions $\mathcal{H}$. The Kolmogorov, Wasserstein and total variation
distances, to cite but these, can all be written in this form.

Specifying the target as well as the class H yields the following immediate 
corollaries. 

Corollary 3.1. Let $p$ and $q$ be probability densities with support $S \subset \mathbb{R}$
satisfying the hypotheses in Theorem 2.3. Then there exist constants $\kappa_1 := \kappa_1(p,q)$
and $\kappa_2 := \kappa_2(p,q)$ such that

\[ \int |p(u) - q(u)|\,du \le \kappa_1 \sqrt{J(p,q)} \]

and

\[ \sup_{x \in \mathbb{R}} |p(x) - q(x)| \le \kappa_2 \sqrt{J(p,q)}. \]

Proof. Take $l(u) = \mathbb{I}_{\{p(u) \le q(u)\}} - \mathbb{I}_{\{p(u) > q(u)\}}$. Using (2.4) with this choice of $l$
and applying Hölder's inequality, one readily sees that there exists a constant
$\kappa_1 > 0$ such that

\[ \int |p(x) - q(x)|\,dx \le \kappa_1 \sqrt{J(p,q)} \]

where $\kappa_1 = \sqrt{\mathrm{E}_q\left[(f_l^p(X))^2\right]}$.

Regarding the second inequality first note that, whenever $x \in S^c$, $|p(x) - q(x)| = 0$,
hence we can concentrate on the supremum over the support $S$. Now
choose $l(u) = \delta_{\{x=u\}}$, the Dirac delta function in $x \in S$. For this choice of $l$ we
obtain after some computations

\[ |q(x) - p(x)| \le p(x) \sqrt{\mathrm{E}_q\left[\left(\mathbb{I}_{[x,b)}(X) - P(X)\right)^2 / (p(X))^2\right]}\, \sqrt{J(p,q)}, \]

where $P$ is the cumulative distribution function of the density $p$ (for which
evidently $P(a) = 0$). Taking the supremum yields the second constant $\kappa_2$.

□

We conclude this paper with a computation of bounds on the constants $\kappa_1$
and $\kappa_2$ for various examples. While these are somewhat related to the so-called
"magic factors" appearing in the literature on Stein's method, the technique we
employ to bound them is different and, we believe, of independent interest. To
the best of our knowledge, such bounds were first obtained in [10] for a Gaussian
target only. Shimizu's results were later improved and extended in [5] and [6].
We recover in Corollary 3.2 below the best known values for $\kappa_1$, and our bound
for $\kappa_2$ yields a significant improvement. We stress the fact that the results
available in the literature only concern a Gaussian target, whereas our approach
allows one to obtain such relationships for virtually any target distribution. Further
explorations of the consequences of Theorem 2.3 also show that it is possible
to relate Stein characterizations to (pseudo-)metrics other than those of the
form (3.2), such as, e.g., Kullback-Leibler divergence or relative entropy (see [5]).

Corollary 3.2. 

1. If $p$ is the exponential distribution with rate 1, then $\kappa_1 \le 1$.

2. If $p = \phi$ is the standard normal distribution, then $\kappa_1 \le \sqrt{2}$.

3. If $p$ is proportional to $e^{-x^4/12}$, then $\kappa_1 \le \sqrt{2\sqrt{2}}$.

In all three cases we have $\kappa_2 \le 1$.

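Proof of the constants $\kappa_1$. Take $l(u) = \mathbb{I}_{\{p(u) \le q(u)\}} - \mathbb{I}_{\{p(u) > q(u)\}}$. Using (2.5)
and the fact that $\int_a^b (l(u) - \mathrm{E}_p[l(X)])p(u)\,du = 0$, we obtain that

\begin{align*}
f_l^p(x) &= \frac{-1}{p(x)} \int_x^b \left(l(u) - \mathrm{E}_p[l(X)]\right)p(u)\,du \\
&= \frac{-2}{p(x)} \int_x^b \left(\mathbb{I}_{\{p(u) \le q(u)\}} - \mathrm{P}_p(p(X) \le q(X))\right)p(u)\,du \\
&=: \frac{-2}{p(x)} \int_x^b h(u)p(u)\,du,
\end{align*}

where $\mathrm{P}_p(X \in A) = \int_A p(u)\,du$ for some set $A$. Let $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, the
density of an exponential-1 random variable (in other words, $a = 0$ and $b = \infty$).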
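Recall that, in this case, the support of $f_l^p$ is a subset of $\mathbb{R}^+$. Then we can write

\begin{align*}
\kappa_1^2 := \mathrm{E}_q\left[(f_l^p(X))^2\right] &= 4 \int_0^\infty q(x)e^{2x} \left(\int_x^\infty h(u)e^{-u}\,du\right)^2 dx \\
&\le 4 \int_0^\infty q(x)e^{2x} \left(\int_x^\infty h^2(u)e^{-2u}\,du\right) dx \\
&\le \frac{4}{2} \int_0^\infty q(x)e^{2x} \left(\int_{2x}^\infty h^2\left(\frac{u}{2}\right)e^{-u}\,du\right) dx,
\end{align*}

where the first inequality follows from Jensen and the second inequality from
a simple change of variables. Applying Hölder's inequality and again changing
variables in the above yields

\begin{align*}
\kappa_1^2 &\le \frac{4}{2} \sqrt{\int_0^\infty q(x)\,dx}\, \sqrt{\int_0^\infty q(x)e^{4x} \left(\int_{2x}^\infty h^2\left(\frac{u}{2}\right)e^{-u}\,du\right)^2 dx} \\
&\le \frac{4}{2 \cdot 2^{1/2}} \left(\int_0^\infty q(x)e^{4x} \left(\int_{4x}^\infty h^4\left(\frac{u}{4}\right)e^{-u}\,du\right) dx\right)^{1/2},
\end{align*}

where $\int_0^\infty q(x)\,dx = 1$ by our assumption that $p$ and $q$ share the same support.
Iterating this procedure $m \in \mathbb{N}$ times, we obtain

\[ \kappa_1^2 \le \frac{4}{2^{M(m)}} \left(\int_0^\infty q(x)e^{2^{m+1}x} \left(\int_{2^{m+1}x}^\infty h^{2^{m+1}}\left(\frac{u}{2^{m+1}}\right)e^{-u}\,du\right) dx\right)^{1/2^m}, \]

where $M(m) = 1 + \frac{1}{2} + \ldots + \frac{1}{2^m}$. Now note that, for each $m \ge 0$, we have
$0 \le h^{2^{m+1}}(u/2^{m+1}) \le 1$. Hence

\[ \kappa_1^2 \le \frac{4}{2^{M(m)}}. \]

Since $M(m) \to 2$, the result follows.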

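If the support of $p$ (and hence also of $q$) is the real line, we use similarly as
above the identity $\int_{-\infty}^\infty (l(u) - \mathrm{E}_p[l(X)])p(u)\,du = 0$ to write, equivalently,

\[ f_l^p(x) = \frac{2}{p(x)} \int_{-\infty}^x h(u)p(u)\,du = \frac{-2}{p(x)} \int_x^\infty h(u)p(u)\,du. \]

This yields

\begin{align*}
\mathrm{E}_q\left[(f_l^p(X))^2\right] &= 4 \int_{-\infty}^\infty q(x) \left(\frac{-1}{p(x)} \int_x^\infty h(u)p(u)\,du\right)^2 dx \\
&= 4 \int_{-\infty}^0 q(x) \left(\frac{1}{p(x)} \int_{-\infty}^x h(u)p(u)\,du\right)^2 dx \\
&\quad + 4 \int_0^\infty q(x) \left(\frac{1}{p(x)} \int_x^\infty h(u)p(u)\,du\right)^2 dx.
\end{align*}

Setting $p(x) = (2\pi)^{-1/2} e^{-x^2/2}$ we get by Jensen's inequality

\begin{align*}
\mathrm{E}_q\left[(f_l^p(X))^2\right] &\le 4 \int_{-\infty}^0 q(x) \left(e^{x^2} \int_{-\infty}^x h^2(u)e^{-u^2}\,du\right) dx \\
&\quad + 4 \int_0^\infty q(x) \left(e^{x^2} \int_x^\infty h^2(u)e^{-u^2}\,du\right) dx =: I^- + I^+.
\end{align*}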

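Both integrals above can be tackled in the same way as for the exponential
distribution. Consider, for instance, $I^-$ for which we can write (thanks to a
simple change of variables)

\begin{align*}
I^- &= 4 \int_{-\infty}^0 q(x) \left(e^{x^2} \int_{-\infty}^x h^2(u)e^{-u^2}\,du\right) dx \\
&= \frac{4}{\sqrt{2}} \int_{-\infty}^0 q(x) \left(e^{x^2} \int_{-\infty}^{\sqrt{2}x} h^2(u/\sqrt{2})e^{-u^2/2}\,du\right) dx.
\end{align*}

Now apply Hölder's inequality to get

\begin{align*}
I^- &\le \frac{4}{\sqrt{2}} \sqrt{\rho}\, \sqrt{\int_{-\infty}^0 q(x) \left(e^{x^2} \int_{-\infty}^{\sqrt{2}x} h^2(u/\sqrt{2})e^{-u^2/2}\,du\right)^2 dx} \\
&\le \frac{4}{\sqrt{2}} \sqrt{\rho}\, \sqrt{\int_{-\infty}^0 q(x) \left(e^{2x^2} \int_{-\infty}^{\sqrt{2}x} h^4(u/\sqrt{2})e^{-u^2}\,du\right) dx},
\end{align*}

where $\rho = \mathrm{P}_q(X < 0)$. Changing variables once more yields

\[ I^- \le I_1^- \]

with

\[ I_1^- = \frac{4\rho^{1/2}}{\sqrt{2}\,(\sqrt{2})^{1/2}} \left(\int_{-\infty}^0 q(x) \left(e^{2x^2} \int_{-\infty}^{(\sqrt{2})^2 x} h^4\left(\frac{u}{(\sqrt{2})^2}\right)e^{-u^2/2}\,du\right) dx\right)^{1/2}. \]

Iterating this procedure $m \in \mathbb{N}$ times we deduce

\[ I^- \le I_1^- \le \cdots \le I_m^- \]

with $I_m^-$ given by

\[ I_m^- = \frac{4\rho^{N(m)}}{2^{N(m+1)}} \left(\int_{-\infty}^0 q(x) \left(e^{2^m x^2} \int_{-\infty}^{(\sqrt{2})^{m+1} x} h^{2^{m+1}}\left(\frac{u}{(\sqrt{2})^{m+1}}\right)e^{-u^2/2}\,du\right) dx\right)^{1/2^m}, \]

where we set $N(m) = \frac{1}{2} + \cdots + \frac{1}{2^m}$ $(= M(m) - 1)$. For every $m$ we have
$0 \le h^{2^{m+1}}(u/(\sqrt{2})^{m+1}) \le 1$ and

\[ \int_{-\infty}^0 q(x) \left(e^{2^m x^2} \int_{-\infty}^{(\sqrt{2})^{m+1} x} e^{-u^2/2}\,du\right) dx \le \mathrm{P}_q(X < 0) \sqrt{\frac{\pi}{2}}. \]

Therefore

\[ I_m^- \le \frac{4}{2^{N(m+1)}} \left(\mathrm{P}_q(X < 0)\right)^{N(m)} \left(\mathrm{P}_q(X < 0) \sqrt{\frac{\pi}{2}}\right)^{1/2^m}. \]

Since $N(m) \to 1$ as $m \to \infty$, we conclude

\[ I^- \le 2\,\mathrm{P}_q(X < 0). \]

One can similarly show that $I^+ \le 2\,\mathrm{P}_q(X > 0)$, and the result follows.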

The computations for densities proportional to $e^{-x^4/12}$ are similar and are
left to the reader.

□ 

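Proof of the constants $\kappa_2$. Let $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, which readily implies $P(x) =
(1 - e^{-x})\,\mathbb{I}_{[0,\infty)}(x)$. This leads to

\begin{align*}
\mathrm{E}_q\left[\left(\mathbb{I}_{[x,\infty)}(X) - P(X)\right)^2 / (p(X))^2\right]
&= \int_0^\infty q(y)e^{2y} \left(\mathbb{I}_{[x,\infty)}(y) - 1 + e^{-y}\right)^2 dy \\
&= \int_0^x q(y)e^{2y}(1 - e^{-y})^2\,dy + \int_x^\infty q(y)e^{2y}e^{-2y}\,dy \\
&= \int_0^x q(y)e^{2y}(1 - 2e^{-y})\,dy + \int_0^x q(y)\,dy + \int_x^\infty q(y)\,dy \\
&\le 1 + e^{2x}(1 - 2e^{-x})\,\mathrm{P}_q(X \le x),
\end{align*}

since $e^{2y}(1 - 2e^{-y})$ is a monotone increasing function on $\mathbb{R}^+$. This immediately
yields

\[ \kappa_2 \le \sup_{x \ge 0} \left(e^{-x} \sqrt{1 + e^{2x}(1 - 2e^{-x})\,\mathrm{P}_q(X \le x)}\right), \]

a quantity which can be bounded by 1.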

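Now let $p(x) = (2\pi)^{-1/2} e^{-x^2/2}$ and $P(x) = \Phi(x)$ the cumulative distribution
function of the standard normal distribution. Similarly as for the exponential,
we have

\begin{align*}
\mathrm{E}_q\left[\left(\mathbb{I}_{[x,\infty)}(X) - P(X)\right)^2 / (p(X))^2\right]
&= 2\pi \int_{-\infty}^\infty q(y)e^{y^2} \left(\mathbb{I}_{[x,\infty)}(y) - \Phi(y)\right)^2 dy \\
&= 2\pi \int_{-\infty}^x q(y)e^{y^2}(\Phi(y))^2\,dy + 2\pi \int_x^\infty q(y)e^{y^2}(1 - \Phi(y))^2\,dy \\
&\le 2\pi e^{x^2}(\Phi(x))^2 \int_{-\infty}^x q(y)\,dy + 2\pi e^{x^2}(1 - \Phi(x))^2 \int_x^\infty q(y)\,dy \\
&= 2\pi e^{x^2}(\Phi(x))^2 + 2\pi e^{x^2}(1 - 2\Phi(x))\,\mathrm{P}_q(X \ge x).
\end{align*}

This again directly leads to

\begin{align*}
\kappa_2 &\le \sup_{x \in \mathbb{R}} \left((2\pi)^{-1/2} e^{-x^2/2} \sqrt{2\pi e^{x^2} \left((\Phi(x))^2 + (1 - 2\Phi(x))\,\mathrm{P}_q(X \ge x)\right)}\right) \\
&= \sup_{x \in \mathbb{R}} \sqrt{(\Phi(x))^2 + (1 - 2\Phi(x))\,\mathrm{P}_q(X \ge x)},
\end{align*}

a quantity which can be shown to equal 1.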

The computations for densities proportional to $e^{-x^4/12}$ are similar and are
left to the reader.
□
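For the Gaussian target the two inequalities of Corollary 3.1 can be spot-checked numerically. The proof above gives $I^- + I^+ \le 2$, that is $\kappa_1 \le \sqrt{2}$, and $\kappa_2 \le 1$. The sketch below is a sanity check on a few hand-picked normal densities $q = N(\mu, \sigma^2)$, not a proof; the parameter pairs, the Simpson rule, the grid search for the supremum and all helper names are our own assumptions.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def simpson(g, lo, hi, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(lo + i * h)
    return total * h / 3.0

def compare(mu, sigma, n=20000):
    """Return (L1 distance, sup distance, sqrt(J)) for p = phi and q = N(mu, sigma^2)."""
    def q(x):
        return phi((x - mu) / sigma) / sigma

    def r(x):
        # r(p, q) = p'/p - q'/q on the real line.
        return -x + (x - mu) / sigma ** 2

    lo = min(-12.0, mu - 12.0 * sigma)
    hi = max(12.0, mu + 12.0 * sigma)
    root_j = math.sqrt(simpson(lambda x: r(x) ** 2 * q(x), lo, hi, n))
    l1 = simpson(lambda x: abs(phi(x) - q(x)), lo, hi, n)
    step = (hi - lo) / n
    # Grid approximation of sup |p - q|.
    sup = max(abs(phi(lo + i * step) - q(lo + i * step)) for i in range(n + 1))
    return l1, sup, root_j

pairs = [(0.5, 1.0), (0.0, 1.5), (1.0, 0.8)]
results = {pair: compare(*pair) for pair in pairs}
for pair, (l1, sup, root_j) in results.items():
    # Compare with kappa_1 <= sqrt(2) and kappa_2 <= 1.
    print(pair, l1, math.sqrt(2.0) * root_j, sup, root_j)
```

In each printed row the second number should not exceed the third, and the fourth should not exceed the fifth.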